Possible k8s OOM Kill prevention pill 2 - rlimit #1370

Draft: taylordowns2000 wants to merge 4 commits into main from memory-concept-2
Conversation

@taylordowns2000 (Member) commented Apr 13, 2026

Background: It took me about 20 seconds to crash a staging worker in Kubernetes:

[screenshots: crashing a staging worker in Kubernetes]

The above run will show up as "lost" in the next 30 minutes.


This PR uses prlimit to set RLIMIT_AS on each forked child process, capping virtual address space so a runaway
run crashes itself instead of OOM-killing the pod.

It's opt-in by detection: the limit is applied when prlimit (from util-linux) is available on Linux, and it's a no-op on macOS / local dev. The PR also adds util-linux to the worker Docker image so that prlimit is available there.

Testing on staging

  1. Deploy this branch to a worker connected to app.staging.openfn.org
  2. Create a workflow with a job that spikes memory beyond the configured limit, e.g.:

     ```js
     fn(state => {
       const arr = [];
       while (true) { arr.push(new Array(1e6).fill('x')); }
       return state;
     });
     ```

  3. Run the workflow and confirm:
     • The run fails with an OOM error (not a pod restart)
     • Other concurrent runs on the same worker are unaffected
     • The worker recovers and picks up new runs normally
  4. Check worker logs for "cgroup memory enforcement enabled" at startup and "killed by SIGKILL (probable OOM)" on the failing run
  5. Verify there are no leftover openfn-worker-* directories under the cgroup root after the run completes

AI Usage

Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):

  • I have used Claude Code
  • I have used another model
  • I have not used AI

You can read more details in our
Responsible AI Policy

Release branch checklist

Delete this section if this is not a release PR.

If this IS a release branch:

  • Run pnpm changeset version from root to bump versions
  • Run pnpm install
  • Commit the new version numbers
  • Run pnpm changeset tag to generate tags
  • Push tags: git push --tags

Tags may need updating if commits come in after the tags are first generated.

@github-project-automation github-project-automation bot moved this to New Issues in Core Apr 13, 2026
@taylordowns2000 taylordowns2000 changed the title cgroup Possible k8s OOM Kill prevention via cgroup - pill 2 Apr 13, 2026
@taylordowns2000 taylordowns2000 changed the title Possible k8s OOM Kill prevention via cgroup - pill 2 Possible k8s OOM Kill prevention pill 2 - cgroup Apr 13, 2026
@taylordowns2000 taylordowns2000 changed the title Possible k8s OOM Kill prevention pill 2 - cgroup Possible k8s OOM Kill prevention pill 2 - rlimit Apr 13, 2026
@josephjclark (Collaborator)

Gosh there's a lot of stuff here, and I have no idea what any of it does. I'll take a close look at it (probably tomorrow)

@josephjclark left a comment

I'm a lot more comfortable with this approach over the cgroup stuff. This feels more direct and targeted at the use-case we need.

I still want to read a little more about it, and I'll make a few changes to the PR.

The big thing I'm concerned about is: will the engine be able to set the correct exit condition when rlimit kills the process? I need to test that locally

```js
const hasPrlimit = detectPrlimitSupport(logger);

if (hasPrlimit && options.maxWorkerMemoryMb) {
  logger.info(
```
I would append this to the previous startup log - it'll be way more useful there


```ts
type WorkerOptions = {
  maxWorkers?: number;
  maxWorkerMemoryMb?: number; // kernel-level memory limit per child process (cgroup v2)
```

I would just re-use the existing memoryLimitMb option

How that limit gets used in cgroups or heap memory settings or whatever is an implementation detail. The admin just needs to say "don't let any given job take more than 500mb".

```js
allWorkers[child.pid!] = child;

if (hasPrlimit && options.maxWorkerMemoryMb) {
  // RLIMIT_AS counts virtual address space, not RSS.
```

This comment doesn't belong here. We should just pass the mb limit into the applyMemoryLimit function


I think I'd also say: take the memory limit used in the run, add 10% (20%?), and set that as the hard process limit.

I don't really know why - I just feel like we should let node control the exit itself, and use the limit as a hard fallback.
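That headroom idea is essentially a one-liner; sketched here with an illustrative 20% margin (the right value would need testing):

```javascript
// Sketch: derive the hard RLIMIT_AS cap from the run's memory limit plus
// headroom. Node's own heap limit handles the normal case; this is the
// hard fallback. 20% is a guess, not a tested value.
const HEADROOM = 1.2;

const hardLimitBytes = (memoryLimitMb) =>
  Math.ceil(memoryLimitMb * HEADROOM) * 1024 * 1024;

console.log(hardLimitBytes(500) / (1024 * 1024)); // 600
```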

```js
execFileSync('prlimit', [
  '--pid',
  String(pid),
  `--as=${limitBytes}:${limitBytes}`,
```

I need to look into:

  • should the soft and hard limit be the same?
  • should we be setting address space or RSS? or both?

@@ -0,0 +1,51 @@

```js
import test from 'ava';
```

Yeah I don't know what these tests are going to tell us. Maybe this is something to do at the integration test level. Maybe it's more appropriate that we don't test this at all?

@taylordowns2000 (Member, Author)

Given the early success of "pill 1", I'm happy to close this. Whatcha think?
